---
title: "Appendix: Fixed Effects Analysis in R"
author: Jonathan Mummolo and Erik Peterson
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,tidy.opts=list(width.cutoff=50),tidy=TRUE)
```

The following code implements our recommendations for analyzing and interpreting fixed effects regression models. We will use Snyder and Strömberg (2010), discussed in the main text, as a working example.

## 1. Isolate Relevant Variation

First set your working directory and load the data.

```{r load data}
##Set Working Directory here using setwd()
load("ss2.Rdata")
##rename dataframe
d<-congrecall
```

The following code replicates the second model in Table 4, Column 1 in Snyder and Strömberg (2010). The outcome is an indicator for whether an individual recalled their congressional representative's name, and the treatment, "cov2c," is media market congruence (range 0 to 1, continuous). The reported coefficient on the treatment in the published article is 0.28.

```{r reg1}
##estimate model of Name Recall with year and incumbent fixed effects
reg<-lm(cong_recall ~ cov2c + factor(year) + factor(incnum), data=d)
##view first two coefficients
summary(reg)$coefficients[1:2,]
##store estimation data
est<-reg$model
##make an object that is the variable name of the treatment
x<-"cov2c"
##store the coefficient estimate
b<-reg$coefficients[x]
```

We have replicated the result. Let's isolate the variation in the treatment being used to estimate this coefficient by residualizing the treatment with respect to the fixed effects.

```{r isolate variation}
x.resid<-lm(cov2c~factor(year) + factor(incnum), data=d)$residuals
```
If this is the same variation being used to estimate the coefficient of interest, then we should be able to regress the outcome on this transformed treatment and get the same coefficient.

```{r same result}
reg2<-lm(est$cong_recall ~x.resid)
summary(reg2)$coefficients[1:2,]
```

We indeed get the same result. Note that we did not residualize the outcome, but if we did, we would also get the same coefficient on the treatment.

```{r y resid}
y.resid<-lm(cong_recall ~factor(year)+factor(incnum), data=d)$residuals
reg3<-lm(y.resid~x.resid)
summary(reg3)$coefficients[1:2,]
```


## 2. Identify a Plausible Shift in X

Now that we have isolated the relevant variation in the treatment and estimated the coefficient of interest, we need to determine what number to multiply the coefficient by to judge whether the treatment effect is substantively significant. The coefficient itself represents the effect of a one-unit shift in the treatment, which in this case is a shift from the theoretical minimum to the theoretical maximum of the variable. Is this a plausible shift?

```{r range}
range(est[,x])
sdx<-sd(est[,x])
sdx
sdxresid<-sd(x.resid)
sdxresid
sdx/sdxresid
##compute % difference
((sdxresid-sdx)/sdx)*100
```

From this basic analysis we learn that a 0-to-1 shift is beyond the observed range of the data [.019, .968]. Further, while the standard deviation of the original treatment is 0.29, the standard deviation of the transformed treatment used during estimation is 0.068, a reduction of 76%.

Let's visualize this reduction in variance. Overlapping density plots are useful for this. We will also center both variables on zero by demeaning them to make for an easier comparison of their spreads.


```{r plot 1}
plot(density(x= est[,x]-mean(est[,x]), bw=.01),  col="black", cex=.3, lty=2, main="Treatment Distribution", xlab="Treatment (Centered on Zero)", ylim=c(0,25),  axes=F)
lines(density(x= x.resid-mean(x.resid), bw=.01), col="red", cex=.7, xlab="")
text(x=.15, y=22, labels="Distribution\nafter Fixed\nEffects", col="red")
text(x=.15,y=18, labels=paste("SD=",round(sdxresid,3),sep=""), cex=1, col="red" )
text(x=.3, y=6, labels="Original\nDistribution", col="black")
text(x=.3,y=3, labels=paste("SD=",round(sdx,3),sep=""), cex=1 )
axis(1, seq(-1,1, by=.2))
axis(2, seq(0,40, by=10), las=2)
```

The plot makes very clear that a drastic reduction in variation has occurred as a result of the fixed effects being employed.

It is also useful to examine the distribution of within-unit ranges when trying to identify plausible counterfactuals. Let's generate that distribution and plot it as well. (Note: By not simultaneously accounting for the year and incumbent fixed effects, the variation plotted in the histogram below deviates slightly from the variation used to estimate the model, but is included to help motivate unit-based counterfactuals.)

```{r ranges}
bounds<-list(NA)
ranges<-NA
units<-unique(est[,"factor(incnum)"])

for(i in 1:length(units)){
  
  bounds[[i]]<-range(est[est[,"factor(incnum)"]==units[i],"cov2c"])
  ranges[i]<-bounds[[i]][2] - bounds[[i]][1]
  
}



hist(ranges,col="grey", main="Within-Unit Ranges\nof Treatment", xlab="Within-Incumbent Ranges", ylim=c(0,1000),xlim=c(0,1),axes=F)
axis(1, seq(0,1, by=.2))
axis(2, seq(0,1000, by=200), las=2)
abline(v=mean(ranges), lty=2)
abline(v=quantile(ranges, probs=.95), lty=2)
abline(v=1, lty=2)
text(x=.18,y=900,labels="mean")
text(x=.7,y=900,labels="95th\npercentile")
text(x=.7,y=500,labels="counterfactual\ndiscussed in text")
arrows(x0=.7,x1=1, y0=420, y1=420, length=.1)
arrows(x0=.18,x1=mean(ranges), y0=855, y1=855, length=.1)
arrows(x0=.7,x1=quantile(ranges, probs=.95), y0=810, y1=810, length=.1)
```

As the histogram shows, the range of the treatment in the bulk of units is concentrated near zero, with a mean of roughly 0.05 and a 95th percentile of about 0.36. 

With the above quantities in hand, researchers can now decide which counterfactual shift in the treatment they would like to discuss. They can also provide a sense of the plausibility of that counterfactual shift by describing where in the within-unit distribution the chosen shift is located.

```{r shifts}
##A within-unit SD shift
b*sdx
##A shift the size of the mean within-unit range
b*mean(ranges)
##A shift the size of the 95th percentile in within-unit ranges
b*quantile(ranges, probs=.95)
## A shift the size of the maximum within-unit range
b*max(ranges)
##How many ranges are zero?
mean(I(ranges==0))
```

In addition, we learn that 44% of the within-unit ranges are zero. 


##References
Snyder, James M. and David Strömberg. 2010. “Press Coverage and Political Accountability.” _Journal of Political Economy_, 118(2), 355-408.
